Background / Context:

Cluster randomized experiments are ubiquitous in modern education research. Although a variety 
of modeling approaches are used to analyze these data, perhaps the most common methodology 
is a normal mixed effects model in which some effects, such as the treatment effect, are regarded as 
fixed, and others, such as the effect of group random assignment or the neighborhoods where the 
students live, are regarded as random. For these models, the standard reference used by education 
researchers is Raudenbush and Bryk (2002). 

Although mixed effects models are widely used to estimate parameters from, and test 
hypotheses about, these experiments, the development of standardized mean difference effect size 
indices for them is relatively recent. For example, effect sizes were recently defined by Hedges 
(2007) for a two-level random intercept model, by Hedges (2011) for a three-level random 
intercept model, and by Lai and Kwok (2014) for two-level cross-classified and partially 
cross-classified models. 

Purpose / Objective / Research Question / Focus of Study: 

This paper unifies the currently published effect sizes in a general mixed effects modeling 
framework. We then apply this framework to suggest an effect size for a model that currently 
lacks a published effect size: a random slope model with heterogeneous treatment effects. 

Significance / Novelty of study: 

We make three contributions: 

1. We propose a framework that unifies the definition and estimation of effect sizes in 
mixed-effects models. Prior work studied only special cases of the general model. 

2. We show that the effect sizes of the general framework have desirable properties: (a) they 
either recover past effect sizes or are comparable with them, (b) they are substantively 
interpretable, (c) their estimators are easily computable by both primary analysts and 
meta-analysts, and (d) these estimators have attractive technical properties such as 
consistency. 

3. We use this framework to suggest an effect size for random-slope models, a type of model 
for which there exists no previously published effect size. We note that this new, random 
slope effect size has the same metric as prior work on random intercept effect sizes, and 
further note that the new effect size estimator has substantially increased precision on 
random slope data. 

Statistical, Measurement, or Econometric Model: 

Consider the following normal mixed effects model 

$$Y = X\beta + Zu + e \qquad (1)$$

where $X$ and $Z$ are fixed design matrices, $\beta$ is a vector of regression coefficients for the fixed 
effects, $u$ is a vector of random effects, and $e$ is a vector of residual errors. Let $\tau$ be a vector 
of variance components and let the functions $D[\tau]$ and $R[\tau]$ map the vector of variance 
components into variance-covariance matrices for the random effect vector $u$ and the error vector $e$, 
respectively. If we assume that $u$ and $e$ are independent, then 

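$$\operatorname{Cov}\begin{bmatrix} u \\ e \end{bmatrix} = \begin{bmatrix} D[\tau] & 0 \\ 0 & R[\tau] \end{bmatrix}$$

and it follows that the covariance of $Y$ is 

$$\Sigma = \operatorname{Var}[Y] = Z D[\tau] Z^T + R[\tau].$$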


For example, a special case of this model is a two-level Hierarchical Linear Model with random 
slopes and intercepts such as might be used to model heterogeneous treatment effects in a cluster 
randomized trial. Following the notation of Raudenbush and Bryk (2002), let $Y_{ij}$ be the response 
of individual $i = 1, \ldots, N$ in group (or cluster) $j = 1, \ldots, J$ so that 

$$Y_{ij} = \gamma_{00} + \gamma_{01} \cdot \mathrm{TREAT}_j + u_{0j} + u_{1j} \cdot \mathrm{TREAT}_j + e_{ij} \qquad (2)$$

where $\gamma_{00}$ is the intercept, $\gamma_{01}$ is the average fixed effect of treatment, $\mathrm{TREAT}_j = 1$ for units in 
the treatment condition, $\mathrm{TREAT}_j = 0$ for units in the control condition, the $u_{0j} \sim N(0, \tau_{00}^2)$ are 
the random intercepts, the $u_{1j} \sim N(0, \tau_{11}^2)$ are the random slopes, the $e_{ij} \sim N(0, \sigma^2)$ are the 
individual errors, and the only non-zero covariance terms are $\operatorname{Cov}(u_{0j}, u_{1j}) = \tau_{10}^2$. In our notation 
then, 
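
$$\beta = \begin{bmatrix} \gamma_{00} \\ \gamma_{01} \end{bmatrix}, \qquad u = (u_{01}, u_{11}, \ldots, u_{0J}, u_{1J})^T,$$

where each row of $X$ is $(1, \mathrm{TREAT}_j)$ and each row of $Z$ carries the same pair $(1, \mathrm{TREAT}_j)$ in the 
two columns belonging to that unit's cluster and zeros elsewhere, so that $X$ is of order $N \times 2$, 
$Z$ is $N \times 2J$, and $u$ is $2J \times 1$. Furthermore, 

$$D[\tau] = I_J \otimes \begin{bmatrix} \tau_{00}^2 & \tau_{10}^2 \\ \tau_{10}^2 & \tau_{11}^2 \end{bmatrix}, \qquad R[\tau] = \sigma^2 I,$$

where $D[\tau]$ is of order $2J \times 2J$, $I_J$ is the $J \times J$ identity matrix, and $I$ is an $N \times N$ identity matrix. 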
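
To make the notation concrete, the following is a minimal sketch that builds $X$, $Z$, $D[\tau]$, and $R[\tau]$ for a small balanced design and forms $\Sigma = Z D[\tau] Z^T + R[\tau]$. It assumes NumPy, an ordering of $u$ that interleaves each cluster's intercept and slope, and purely illustrative variance components; the function names and all numeric values are hypothetical and do not come from the paper.

```python
import numpy as np

def build_design(treat, n_per_cluster):
    """Return X (N x 2) and Z (N x 2J) for a balanced two-level random slope model."""
    treat = np.asarray(treat, dtype=float)
    J = treat.size
    cluster = np.repeat(np.arange(J), n_per_cluster)  # cluster index of each unit
    t = treat[cluster]                                 # TREAT_j repeated for each unit
    N = cluster.size
    X = np.column_stack([np.ones(N), t])
    Z = np.zeros((N, 2 * J))
    Z[np.arange(N), 2 * cluster] = 1.0      # random intercept column of cluster j
    Z[np.arange(N), 2 * cluster + 1] = t    # random slope column of cluster j
    return X, Z

def build_covariances(J, N, sigma2, tau00, tau11, tau10):
    """Return D (2J x 2J, block diagonal) and R (N x N) from the variance components."""
    T = np.array([[tau00, tau10], [tau10, tau11]])
    D = np.kron(np.eye(J), T)
    R = sigma2 * np.eye(N)
    return D, R

# Illustrative values only: 4 clusters of 3 units, alternating control/treatment.
treat = [0, 1, 0, 1]
X, Z = build_design(treat, n_per_cluster=3)
D, R = build_covariances(J=4, N=12, sigma2=1.0, tau00=0.2, tau11=0.1, tau10=0.05)
Sigma = Z @ D @ Z.T + R
```

In this sketch the diagonal of `Sigma` differs between control and treated units, which is the non-constant diagonal case that motivates a separate scale parameter.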

Definition of an effect size 

In this paper, we define effect sizes in relation to hypothesis tests. We consider only hypothesis 
tests that can be expressed as a linear combination of regression coefficients being equal to zero, 

$$H_0 : \ell^T \beta = 0$$

where $\ell$ is a vector. For example, in the model of Equation 2, $\ell = (0, 1)^T$ gives 
$H_0 : \gamma_{01} = 0$, which is tested with a Wald t test. 

This t statistic based approach is also followed by Hedges (2007), Hedges (2011), and Lai and 
Kwok (2014). These papers, however, only explicitly discussed the definition of effect sizes in 
the context of models where the diagonal of $\Sigma$ is constant, i.e. where the diagonal of $\Sigma$ makes a 
natural scale for an effect size. We extend their work to the more general mixed effects model of 
Equation (1) by defining a scale parameter for the case where the diagonal of $\Sigma$ is non-constant. 


SREE Spring 2016 Conference Abstract Template 




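One interpretable scale parameter is the average variance of the observed units in the sample. We 
define the average variance as 

$$\omega^2 = \frac{1}{N} \operatorname{trace}\left[ Z D[\tau] Z^T + R[\tau] \right]$$

which for the model of Equation 2 would be 

$$\omega^2 = \sigma^2 + \tau_{00}^2 + \tau_{11}^2 \left( \frac{1}{N} \sum_{ij} \mathrm{TREAT}_j^2 \right) + 2 \tau_{10}^2 \left( \frac{1}{N} \sum_{ij} \mathrm{TREAT}_j \right).$$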
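
The average variance can be computed from the diagonal of the covariance matrix alone, since only the trace is needed. The sketch below does this for the random slope model of Equation 2 in plain Python; the function names and the variance component values are illustrative assumptions, not quantities from the paper.

```python
def unit_variance(treat_j, sigma2, tau00, tau11, tau10):
    """Variance of one unit's response given its cluster's treatment indicator."""
    return sigma2 + tau00 + tau11 * treat_j ** 2 + 2.0 * tau10 * treat_j

def average_variance(treat_per_unit, sigma2, tau00, tau11, tau10):
    """Average of the unit variances, i.e. the trace of the covariance divided by N."""
    n = len(treat_per_unit)
    total = sum(unit_variance(t, sigma2, tau00, tau11, tau10) for t in treat_per_unit)
    return total / n

# Illustrative values only: the same components, two different allocations.
components = dict(sigma2=1.0, tau00=0.2, tau11=0.1, tau10=0.05)
balanced = average_variance([0] * 6 + [1] * 6, **components)
unbalanced = average_variance([0] * 9 + [1] * 3, **components)
```

The two allocations give different values even though the variance components are identical, which illustrates how this scale depends on the observed treatment indicators.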


Note that in models where the diagonal of $\Sigma$ is constant, $\omega^2$ will be equal to the diagonal. This 
measure, however, can depend on the observed values of the matrix $Z$, e.g. the $\mathrm{TREAT}_j$ values. 
To address this, we define the expected average variance as 

$$E[\omega^2] = \frac{1}{N} \operatorname{trace}\left[ E\left[ Z D[\tau] Z^T + R[\tau] \right] \right]$$

which, for the model of Equation 2 would be 

$$E[\omega^2] = \sigma^2 + \tau_{00}^2 + \tau_{11}^2 \left( \operatorname{Var}[\mathrm{TREAT}_j] + (E[\mathrm{TREAT}_j])^2 \right) + 2 \tau_{10}^2 E[\mathrm{TREAT}_j].$$

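If we further assume that the probability of assignment to treatment is $p$, i.e. 
$\mathrm{TREAT}_j \sim \mathrm{Bernoulli}(p)$, then $\operatorname{Var}[\mathrm{TREAT}_j] + (E[\mathrm{TREAT}_j])^2 = p(1 - p) + p^2 = p$ and 

$$E[\omega^2] = \sigma^2 + \tau_{00}^2 + \tau_{11}^2 p + 2 \tau_{10}^2 p.$$

We define an effect size of the linear combination $\ell^T \beta$ as 

$$\delta = \frac{\ell^T \beta}{\sqrt{E[\omega^2]}}$$

which, for the model in Equation 2 would be 

$$\delta = \frac{\gamma_{01}}{\sqrt{\sigma^2 + \tau_{00}^2 + \tau_{11}^2 p + 2 \tau_{10}^2 p}} \qquad (3)$$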


The inclusion of $p$ in the expression for $\delta$ is somewhat unsettling: we generally desire that effect 
size indices not depend on the particular details of a given experiment. Note, however, that 
$\delta$ is interpreted as the mean difference (or, more generally, linear combination) scaled by 
the expected variation of a replication of the experiment. That is, $p$ is a value from the future, not 
from the current experiment, per se. If, for example, we wish to consider the situation where the 
intervention is universally implemented, then the appropriate effect size has $p = 1$. If, however, 
we wish to consider the effect of a very small pilot implementation of the intervention, then we 
should set $p = 0$. The choice of $p$ will depend on how we wish to interpret $\delta$. 
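
The role of $p$ can be seen numerically. The sketch below evaluates the random slope effect size for several replication scenarios, assuming it is the fixed treatment effect divided by the square root of the expected average variance under Bernoulli($p$) assignment. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def expected_average_variance(p, sigma2, tau00, tau11, tau10):
    """Expected average variance when TREAT_j ~ Bernoulli(p)."""
    return sigma2 + tau00 + tau11 * p + 2.0 * tau10 * p

def effect_size(gamma01, p, sigma2, tau00, tau11, tau10):
    """Treatment effect scaled by the square root of the expected average variance."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    scale = expected_average_variance(p, sigma2, tau00, tau11, tau10)
    return gamma01 / math.sqrt(scale)

# Illustrative values only.
params = dict(gamma01=0.5, sigma2=0.8, tau00=0.2, tau11=0.1, tau10=0.05)
delta_pilot = effect_size(p=0.0, **params)      # very small pilot implementation
delta_half = effect_size(p=0.5, **params)       # half of the clusters treated
delta_universal = effect_size(p=1.0, **params)  # universal implementation
```

With positive slope variance and covariance, as assumed here, the effect size shrinks as $p$ grows because the expected variability of the replication increases.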

Estimation of $\delta$ 

Estimation of $\delta$ depends on the structure of $\Sigma$. Let 

$$\Sigma = \omega^2 \cdot V_\omega$$

so that $\operatorname{trace}[V_\omega] = N$. For example, in a simple two-level random intercept model, 
$\omega^2 = \sigma_W^2 + \sigma_B^2$, where $\sigma_W^2$ is the variation within schools and $\sigma_B^2$ is the variation between schools. In 
this case, $V_\omega$ is the block diagonal matrix of Intra-Class Correlations (ICCs), where 
$\rho = \sigma_B^2 / (\sigma_W^2 + \sigma_B^2)$ is the ICC. 
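
For the random intercept example, the factorization of the covariance into a scale and a scaled matrix can be written out directly. The sketch below assumes NumPy and that each school's block has ones on the diagonal and the ICC in every off-diagonal position, which is what a trace of $N$ requires; the function name and the numeric values are illustrative assumptions.

```python
import numpy as np

def scaled_covariance(cluster_sizes, sigma2_w, sigma2_b):
    """Return the scale, the ICC, and the block diagonal scaled covariance matrix."""
    omega2 = sigma2_w + sigma2_b
    rho = sigma2_b / omega2
    N = sum(cluster_sizes)
    V = np.zeros((N, N))
    start = 0
    for n in cluster_sizes:
        block = np.full((n, n), rho)   # ICC between units in the same school
        np.fill_diagonal(block, 1.0)   # unit diagonal so that trace(V) = N
        V[start:start + n, start:start + n] = block
        start += n
    return omega2, rho, V

# Illustrative values only: two schools with 3 and 2 students.
omega2, rho, V = scaled_covariance([3, 2], sigma2_w=0.8, sigma2_b=0.2)
Sigma = omega2 * V
```

Multiplying the scaled matrix by the scale returns a covariance with the total variance on the diagonal and the between-school variance within each block.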




If $V_\omega$ is known a priori, e.g. $\rho$ is known a priori, then Generalized Least Squares (GLS) provides 
optimal estimates of $\beta$. In the paper, we derive a Uniformly Minimum Variance Unbiased 
Estimator (UMVUE) of $\delta$ and an estimator of its variance. Our approach is a direct extension of 
the arguments in Hedges (1981), where we note that, under mild regularity conditions, (i) the 
GLS t statistic is a function of complete and sufficient statistics, (ii) the non-centrality parameter 
of the GLS t statistic is a simple function of the effect size of interest, and (iii) therefore, an 
appropriately scaled GLS t statistic is an unbiased estimator composed of complete and sufficient 
statistics, i.e. via the Lehmann-Scheffé theorem, it is the UMVUE of $\delta$. 
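
As a rough illustration of the known-covariance case, the sketch below computes GLS coefficients, the usual GLS estimate of the scale, and a naive ratio of the two. It assumes NumPy; the function names and data are hypothetical. The naive ratio is not the UMVUE described above, because it omits the scaling that makes the estimator unbiased, and the paper's own expressions are not reproduced here.

```python
import numpy as np

def gls_fit(y, X, V):
    """GLS coefficients and scale estimate when the scaled covariance V is known."""
    V_inv = np.linalg.inv(V)
    XtVi = X.T @ V_inv
    cov_unscaled = np.linalg.inv(XtVi @ X)   # (X' V^-1 X)^-1
    beta_hat = cov_unscaled @ XtVi @ y
    resid = y - X @ beta_hat
    df = X.shape[0] - X.shape[1]
    omega2_hat = float(resid @ V_inv @ resid) / df
    return beta_hat, omega2_hat, cov_unscaled

def naive_effect_size(beta_hat, omega2_hat, ell):
    """Plug-in ratio of the linear combination to the estimated scale (uncorrected)."""
    return float(ell @ beta_hat) / np.sqrt(omega2_hat)

# Illustrative data only: one control and one treated cluster of two units, ICC 0.2.
rho = 0.2
block = np.array([[1.0, rho], [rho, 1.0]])
V = np.kron(np.eye(2), block)
X = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 3.0, 4.0, 6.0])
beta_hat, omega2_hat, cov_unscaled = gls_fit(y, X, V)
delta_naive = naive_effect_size(beta_hat, omega2_hat, ell=np.array([0.0, 1.0]))
```

In this toy design the GLS coefficients coincide with the cluster mean difference, since the treatment indicator is constant within clusters and the clusters are the same size.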

If $V_\omega$ is not known a priori, then Restricted Maximum Likelihood (REML) has attractive 
properties. For example, the REML estimate $\hat{\tau}$ of $\tau$ is a consistent estimator, so a plug-in estimator, 
where $\hat{\tau}$ is substituted for $\tau$ in the GLS estimator of $\delta$, is also a consistent estimator under mild 
regularity conditions. In the paper, we give expressions for both the point estimate and an 
estimator of its variance. We also show that these estimators can be calculated by meta-analysts. 

Usefulness / Applicability of Method: 

The new method leads to effect sizes and estimators for a wide variety of mixed effects models. 
We show in the paper that the definition of $\delta$ is useful because it has the following properties: 

1. $\delta$ is interpretable: it is the observed mean difference (or linear combination) scaled by a 
measure of the expected variability of a future replication of the experiment. 

2. $\delta$ can recover prior effect sizes such as those defined by Hedges (2007), Hedges (2011), and 
Lai and Kwok (2014). For example, Hedges (2007) considered a random intercept model 
for a group randomized experiment. He defined three effect sizes $\delta_T$, $\delta_W$, and $\delta_B$. We show 
that the new $\delta$ recovers $\delta_T$, $\delta_W$, and $\delta_B$ depending on how the replication of the experiment 
is defined. If the experiment is replicated in new schools with new students, then $\delta = \delta_T$; if 
the same schools, but new students, then $\delta = \delta_W$; and if new schools, but the same students, 
then $\delta = \delta_B$. 

3. $\delta$ is on the same scale as prior effect sizes. For example, the Hedges (2007) random 
intercept effect size estimator is on the same scale as the new random slope estimator $\hat{\delta}$. 
Figure 1 displays the results of a simulation where many datasets were generated from 
Equation 2 and both the random intercept estimator $\hat{\delta}_T$ and the random slope estimator $\hat{\delta}$ were 
calculated on each dataset. Note that the means of both estimators are similar: each is 
estimating the same quantity. Note also, however, that the new estimator is far more 
efficient on random slope data, i.e. $\hat{\delta}_T$ and $\hat{\delta}$ are comparable, but $\hat{\delta}$ is more efficient. 
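
As a worked special case of item 2, consider Equation 3 when there is no treatment effect heterogeneity. The sketch below assumes that the notation of Equation 2 maps onto the random intercept model as $\sigma^2 = \sigma_W^2$ and $\tau_{00}^2 = \sigma_B^2$, and that $\delta_T$ is standardized by the total variance as in Hedges (2007); both are stated here as assumptions rather than taken from the paper's derivation.

```latex
\tau_{11}^2 = \tau_{10}^2 = 0
\;\Longrightarrow\;
E[\omega^2] = \sigma^2 + \tau_{00}^2 = \sigma_W^2 + \sigma_B^2,
\qquad
\delta = \frac{\gamma_{01}}{\sqrt{\sigma_W^2 + \sigma_B^2}} = \delta_T .
```

In this case $p$ drops out of the expression, so the effect size no longer depends on the assumed assignment probability of the replication.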

Conclusions: 

The general framework we propose generates interpretable effect sizes for a wide class of mixed 
effects models. Furthermore, we illustrate the value of this framework by both recovering the 
definitions of prior effect sizes published in the literature and by deriving a new effect size for a 
cluster randomized controlled trial with heterogeneous treatment effects. 

There are, however, several open questions. Can unconditional and conditional effect sizes be 
converted between each other in mixed effects models? Are similar effect size indices defined 
from F and $\chi^2$ tests of hypothesis in the general model of Equation 1 similarly interpretable? 

Figure 1: The Hedges (2007b) random intercept effect size estimator has a wider sampling 
distribution on random slope data than the new, proposed random slope estimator $\hat{\delta}$. Both 
sampling distributions, however, have very similar means. 